Optimize First, Buy Later: Analyzing Metrics to Ramp-Up Very Large Knowledge Bases

Authors

  • Paea LePendu
  • Natalya Fridman Noy
  • Clement Jonquet
  • Paul R. Alexander
  • Nigam H. Shah
  • Mark A. Musen

Abstract

As knowledge bases move into the landscape of larger ontologies and have terabytes of related data, we must work on optimizing the performance of our tools. We are easily tempted to buy bigger machines or to fill rooms with armies of little ones to address the scalability problem. Yet, careful analysis and evaluation of the characteristics of our data—using metrics—often leads to dramatic improvements in performance. Firstly, are current scalable systems scalable enough? We found that for large or deep ontologies (some as large as 500,000 classes) it is hard to say because benchmarks obscure the load-time costs for materialization. Therefore, to expose those costs, we have synthesized a set of more representative ontologies. Secondly, in designing for scalability, how do we manage knowledge over time? By optimizing for data distribution and ontology evolution, we have reduced the population time, including materialization, for the NCBO Resource Index, a knowledge base of 16.4 billion annotations linking 2.4 million terms from 200 ontologies to 3.5 million data elements, from one week to less than one hour for one of the large datasets on the same machine.

Similar resources

An Accurate Modeling of Delay and Slew Metrics for On-Chip VLSI RC Interconnects for Ramp Inputs Using Burr’s Distribution Function

This work presents an accurate and efficient model to compute the delay and slew metrics of on-chip interconnects in high-speed CMOS circuits for ramp inputs. Our metric assumption is based on Burr’s distribution function. Burr’s distribution is used to characterize the normalized homogeneous portion of the step response. We used the PERI (Probability distribution function Extension for Ra...

An Explicit Model of Delay and Slew Metric for On-Chip VLSI RC Interconnects for Ramp Inputs using Gamma Distribution Function

Moments of the impulse response are widely used for interconnect delay analysis, from the explicit Elmore delay (the first moment of the impulse response) expression to moment-matching methods, which create reduced-order trans-impedance and transfer function approximations. However, the Elmore delay is fast becoming ineffective for deep submicron technologies, and reduced order transfer functi...

A goal-question-metrics model for configuration knowledge bases

Configuration knowledge bases are a well-established technology for describing configurable products like cars, computers, and financial services. Such knowledge bases are characterized by sets of constraints, variables, and domains. A lot of research has been done on testing knowledge bases, finding conflicts, and recommending repair actions. In contrast, less work has been done in the area of m...

Analytical Modeling of Delay and Slew for On-Chip VLSI RC Global Interconnect Using Three Circuit Moments

In high-speed digital integrated circuits, interconnect delay can be significant and should be included for accurate analysis. Delay analysis for interconnects has been widely done using moments of the impulse response, from the explicit Elmore delay (the first moment of the impulse response) expression to moment-matching methods, which create reduced-order trans-impedance and transfer func...

Application of Random Amplified Microsatellite Polymorphism (RAMP) in Prunus Characterization and Mapping

Random amplified microsatellite polymorphism (RAMP) is a PCR-based marker that combines two classes of markers: simple sequence repeat (SSR) and random amplified polymorphic DNA (RAPD) markers. RAMP has been demonstrated to be a potentially valuable molecular marker for the study of genetic relationships in cultivated plant species. The objective of this study was to optimize the...

Publication date: 2010